Abstract —Current research into discovering important intermediate
nodes in a network suspected of containing criminal activity is
highly dependent on network centrality values. Betweenness
centrality, for example, is widely used to rank the nodes that act
as brokers in the shortest paths connecting all source nodes with
all end nodes in a network. However, both the shortest-path node
betweenness and the linearly scaled betweenness can only show
rankings for all the nodes in a network, and not for just a subset
of source nodes. In this paper we explore the mathematical concept
of pair-dependency of a source on intermediate nodes, adapting the
concept to criminal relationships and introducing a new
source-intermediate reliance measure. To illustrate our measure, we
apply it to rank the nodes in the Enron email dataset and the
Noordin Top Terrorist networks. We compare the reliance ranking with
Google PageRank, Markov centrality and betweenness centrality, and
show that a criminal investigation using the reliance measure will
lead to a different prioritisation of the people to investigate.
While the ranking for the Noordin Top terrorist network yields more
extreme differences than that for the Enron email transaction
network, in the latter the reliance values for the set of finance
managers immediately identified another employee convicted of money
laundering.

Index Terms —Shortest path, betweenness, intermediate node, 
reliance 

EDICS Category: FOR-OTHS, OTH-BGDT 

I. Introduction

A number of methods have been proposed for ranking the important
nodes in a network. One such measure, betweenness centrality,
attributes importance to intermediate nodes in a path, as these
nodes are necessary to retain the flow within the network. However,
given a node of interest, u, betweenness centrality cannot rank the
nodes important only to u, instead ranking the nodes with regard to
ensuring flow in the network as a whole. In this paper, we present a
new measure, the reliance measure, that achieves this aim of ranking
nodes based on their importance to a given node. Knowing one
suspect, our reliance measure would allow the creation of an ordered
list of trusted connections. Thus this method of ranking nodes would
enable an investigator to prioritize her search for criminal
connections. Our research has the specific aim of aiding the
investigation of money laundering crimes, where it is often
difficult to identify influential entities with respect to the main
source of illegal money by using only data mining techniques.



To show the versatility of our measure, we apply it to rank nodes in
the Enron dataset and the Noordin Top Terrorist networks. Our
reliance measure is able to show clear differences in the ranking of
particular Enron employees as well as of members of the terrorist
network, making it easy to pick important people for further
investigation. In contrast,
the betweenness centrality ranking of the same nodes does 
not indicate any nodes of interest. Indeed, in our experiment, 
using the set of Enron finance managers, the reliance ranking 
highlights an employee who was not a finance manager, but 
was convicted of money laundering. 

The best-known method of ranking intermediate nodes in a network is
betweenness centrality. The betweenness centrality of a node is a
measure that computes the number of geodesics (shortest paths) going
through that node. In a network, the node that appears most often in
the shortest paths linking every pair of nodes or components acts as
a broker or intermediary [35], and has the highest betweenness
centrality value. Researchers, including Xu and Chen, and Morselli,
have used betweenness centrality to identify a gatekeeper in
criminal networks, while others have used it to identify potential
persons of influence in these networks. Catanese et al., for
example, utilise multiple network metrics in their log analysis tool
to identify key members in a criminal network, in particular using
betweenness centrality to show the communication control of one node
over other nodes. In all of this research betweenness centrality is
directly used to identify an important node that provides the
communication path to many other nodes. However, using betweenness
centrality and/or degree centrality is insufficient when identifying
important nodes in a money laundering network.

Money laundering networks are inherently different to other 
criminal networks, in that they may have many members who 
are important to a suspect but the nature of the crime dictates 
that these members are hidden or inactive (not possessing a 
high centrality value). Research shows that the node with the
highest betweenness centrality could
also have the highest closeness as well as highest degree 
centrality. Both of these latter attributes point to nodes that 
are highly visible. Thus, using betweenness centrality alone 
is an impractical way of picking an important person in a 
money laundering network. In addition, in money laundering, 
a suspect could rely more on a person at quite some distance 
from them. We address both of these issues in this paper. 

Our research thus starts with the idea of dependence. Researchers
have approached the task of finding influential nodes by combining
the concept of dependence with betweenness, where the dependence of
node i on node j in a network reflects in some way the influence of
node j on node i [27]. Dependency can be said to exist when there is
information flow between a source and a destination. For example,
while identifying important people in the Enron email dataset,
Shetty et al. characterised email link dependency in various ways:
an email was marked as dependent on another when both emails
appeared within the same time frame, when a major part was copied
from one email to another and the received email was then forwarded,
or when the links in the email were based on a particular event. The
authors based these characterisations on intuition and domain
knowledge, which makes it difficult to interpret the relationships
they discovered between individuals.

In this paper, we propose a new dependency formula to calculate the
dependence of a source node on an intermediate node. We name this
dependency reliance. The reliance
formula is used as a tool to build a source-intermediate node 
reliance measure algorithm that can measure the suspicious 
sub-network that contains the criminals and/or suspects and 
their associates. The reliance values are used to rank the 
nodes upon which a criminal relies. This could then allow 
the identification of nodes that are relied on by a group of 
known suspects and thus progress the criminal investigation. 

The commonly deployed centrality measures, including 
betweenness centrality, compute the central value of a node by 
summing the values over all source nodes in a network. Our 
reliance formula focuses on a particular source node and on 
the intermediate nodes in the shortest paths from this particular 
source node to every possible end node in the network. Our
interest is in the relationship between a specific source node
(either a criminal or suspect) and its intermediate nodes, because
an intermediary carrying information tends to hide illicit activity
either on purpose or inadvertently. Campbell mentions in his paper
that organized crimes such as money laundering, smuggling, and
trafficking of antiquities can be linked to criminals who have
multiple associations. Thus, a mafia member who acts as an
intermediary in a mafia endeavor could escape criminal charges
because identification is most commonly done by computing the
centrality value of a node, summing over all possible source nodes
in the network, and the intermediary may not stand out in this
manner.

We illustrate the use of our reliance measure by applying it to the
Enron email transactions, obtaining a reliance sub-network to aid a
money laundering criminal investigation. History shows that some
senior officials at Enron were involved in accounting
irregularities, leading to an investigation of various internal
business and financial activities including the partnerships set up
by some finance managers. This was followed in early January 2002 by
an inquiry into other criminal activities involving many Enron
executives, and finally by the conviction of 10 individuals for
money laundering. In [31], we built a shortest path network search
algorithm (SPNSA) using shortest paths combined with two centrality
measures: eigenvector centrality and betweenness centrality.

We applied SPNSA [30], [31] to the Enron dataset, extracting a
sparse and more manageable network of people for further criminal
investigation. The method in [30] began with the identification of a
list of suspects (the algorithm feed) to extract a sub-network. Here
we use the new reliance measure proposed in this paper to calculate
the reliance values and rank each intermediate node of each suspect
in this extracted network. For the purpose of illustration, we
compare our reliance ranking with three other ranking measures:
betweenness, Google PageRank and Markov centrality.

The rest of the paper is organised as follows: Section II contains
the definitions and mathematical terms used, while Section III
describes the proposed method of calculating the reliance of source
nodes on the intermediate nodes. In Section IV, we compare the
pair-dependency formulae given by Freeman, Brandes, and Geisberger
et al. with our reliance method, and quantify the difference between
the reliance measure and Geisberger et al.'s formula. Section V
describes the two datasets used in our experiments: the Enron email
transactions and the Noordin Top Terrorist networks. In Section VI
we apply the reliance measure to these two datasets to show the
difference in ranking between the reliance measure and the Brandes
and Geisberger et al. measures, and compare the reliance rankings
with those found using both the Markov walk-path and the Google
PageRank algorithms. Section VII details the prioritisation of nodes
for the purpose of a criminal investigation. Finally, we give the
conclusion.


II. Preliminaries

Since our reliance formula is closely related to the betweenness
centrality measure, we will start with the necessary graph-theoretic
terminology [25], and the three main definitions of betweenness:
those of Freeman, Brandes, and Geisberger et al. For more details
see Brandes [7] and Newman.

Let $G = (V, E)$ be a graph where $V = \{v_1, v_2, \ldots\}$ is the
set of vertices (also called nodes) and $E = \{e_1, e_2, \ldots\}$
the set of edges (representing the connections between the
vertices), with the total number of vertices and edges given by
$|V| = n$ and $|E| = m$, respectively. An edge that has the same
start node and end node is called a self-loop or a loop. If more
than one edge is associated with a pair of nodes, these are called
multiple edges. For our purpose, we exclude all self-loops and
consider multiple edges as one edge.

A path is a sequence of edges that connects multiple nodes. Given a
path $(s, t)$, we call $s$ the source node and $t$ the destination,
end node or target node. In between the source and the target lies
the alternating sequence of nodes and edges, for instance
$s, e_1(s, r_1), r_1, e_2(r_1, r_2), r_2, \ldots, t$, that make up
the path $(s, t)$. Here $e(u, v)$ denotes the edge connecting nodes
$u$ and $v$. In the graph $G$, the length of an $(s, t)$-path is the
number of edges it contains, and the distance, $\mu(s, t)$, from $s$
to $t$ is defined as the minimum length of any $(s, t)$-path if one
exists and undefined otherwise [7]. Let the number of shortest paths
from $s$ to $t$ be given by $\sigma_{st}$, and let $\sigma_{st}(v)$
be the number of shortest paths from $s$ to $t$ that pass through
$v$.
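
As an illustration of these quantities, the following Python sketch computes hop distances and shortest-path counts from one source with a breadth-first search. The adjacency-dictionary representation, the function name `bfs_counts` and the four-node example graph are assumptions made for illustration; they do not come from the paper.

```python
from collections import deque


def bfs_counts(adj, s):
    """Return (dist, sigma) for source s in an unweighted graph.

    dist[t]  : minimum number of edges from s to t
    sigma[t] : number of distinct shortest paths from s to t
    Nodes unreachable from s are absent from both dictionaries.
    """
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


# Hypothetical undirected "diamond" graph: a-b, a-c, b-d, c-d.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
dist, sigma = bfs_counts(adj, "a")
print(dist["d"], sigma["d"])
```

In this toy graph node d lies at distance 2 from a and is reached by two shortest paths, one through b and one through c.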

Definition 1 (Freeman). The pair-dependency $\delta_{st}(v)$ of a
pair of nodes $s$ and $t$ on an intermediate node $v$ is the
proportion of the shortest paths from $s$ to $t$ that contain $v$,
that is:

$$\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad (1)$$

The betweenness centrality of $v$ is then the sum of all such
pair-dependencies:

$$BC(v) = \sum_{s \neq v \neq t \in V} \delta_{st}(v) \qquad (2)$$
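
The pair-dependency and the betweenness sum can be sketched in Python as follows. This is a minimal, unoptimised illustration: it assumes an unweighted graph stored as an adjacency dictionary, it sums over ordered pairs (s, t), which counts each unordered pair of an undirected graph twice, and the names `bfs_counts`, `pair_dependency` and `betweenness` are hypothetical.

```python
from collections import deque
from fractions import Fraction


def bfs_counts(adj, s):
    """Hop distances and shortest-path counts from s (breadth-first search)."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def pair_dependency(adj, s, t, v):
    """Fraction of shortest s-t paths that pass through v (pair-dependency)."""
    if v in (s, t):
        return Fraction(0)
    dist_s, sigma_s = bfs_counts(adj, s)
    dist_v, sigma_v = bfs_counts(adj, v)
    if t not in dist_s or v not in dist_s or t not in dist_v:
        return Fraction(0)
    if dist_s[v] + dist_v[t] != dist_s[t]:
        return Fraction(0)               # v is on no shortest s-t path
    # Shortest s-t paths through v = (paths s..v) * (paths v..t).
    return Fraction(sigma_s[v] * sigma_v[t], sigma_s[t])


def betweenness(adj, v):
    """Sum of pair-dependencies over ordered pairs (s, t), s != v != t."""
    return sum(
        pair_dependency(adj, s, t, v)
        for s in adj for t in adj
        if s != t and v not in (s, t)
    )


# Hypothetical undirected "diamond" graph: a-b, a-c, b-d, c-d.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
half = pair_dependency(adj, "a", "d", "b")
bc_b = betweenness(adj, "b")
```

Here node b carries one of the two shortest a-d paths, so its pair-dependency for (a, d) is 1/2, and summing over the ordered pairs (a, d) and (d, a) gives a betweenness of 1.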

In 2001, Brandes introduced an algorithm for computing 
betweenness centrality in a network. 

Definition 2 (Brandes' Algorithm). The dependence of $s$ on an
intermediate node $v$ is given by:

$$\delta_{s\bullet}(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \cdot (1 + \delta_{s\bullet}(w)) \qquad (3)$$

Here $\{w : v \in P_s(w)\}$ is the set of all nodes $w$ where $v$ is
an immediate predecessor of $w$ in a shortest path from $s$ to $w$,
that is $v \in P_s(w)$.

According to Brandes, $\delta_{s\bullet}(v)$, the dependence of $s$
on $v$, is positive, that is $\delta_{s\bullet}(v) > 0$, only when
$v$ lies on at least one shortest path from $s$ to $t$, and on any
such path there is exactly one edge $\{v, w\}$ with
$v \in P_s(w)$. Brandes' algorithm has been used, among other
things, to identify the edge with the highest betweenness that
separates two communities [38].
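
The recursion of Definition 2 can be sketched in Python for a single source. This is an illustrative sketch for unweighted graphs stored as adjacency dictionaries, not the authors' code; the function name `brandes_dependency` and the four-node path graph are assumptions.

```python
from collections import deque


def brandes_dependency(adj, s):
    """Dependence of source s on every node, accumulated from the far end.

    For each node w and each immediate predecessor v of w on a shortest
    path from s, add (sigma_sv / sigma_sw) * (1 + delta(w)) to delta(v).
    """
    dist, sigma = {s: 0}, {s: 1}
    preds = {s: []}
    order = []                           # nodes by non-decreasing distance
    queue = deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                preds[w].append(u)
    delta = {v: 0.0 for v in order}
    for w in reversed(order):            # farthest nodes first
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
    delta[s] = 0.0                       # a source does not depend on itself
    return delta


# Hypothetical path graph 1 - 2 - 3 - 4 with source node 1.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
delta = brandes_dependency(path_graph, 1)
```

On this path graph node 2 lies on the shortest paths to both 3 and 4, so its dependence is 2, while node 3 lies only on the path to 4 and gets 1.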

In 2008, Geisberger et al. applied a linear scaling to Brandes'
algorithm by introducing the length function (unit edge weight),
thus obtaining a better approximation for betweenness centrality on
large networks without overestimating the values for the nodes near
a pivot or parent node, which was the case with Brandes' algorithm.

Definition 3 (Geisberger et al.). Given a shortest path from source
$s$ to a node $w$ with node $v$ a predecessor of $w$ on this path,
the length function is the ratio of the distance $\mu(s, v)$ of $v$
from $s$ to the distance of $w$ from $s$, $\mu(s, w)$. Thus, the
dependence of $s$ on an intermediate node $v$ changes to:

$$\delta_{s*}(v) = \sum_{w : v \in P_s(w)} \left[ \frac{\mu(s,v)}{\mu(s,w)} \cdot \frac{\sigma_{sv}}{\sigma_{sw}} \times (1 + \delta_{s*}(w)) \right] \qquad (4)$$


While Brandes' algorithm gives good exact results for small
networks, often it is not possible to get exact results in
reasonable running time for large networks. With this change,
Geisberger et al. could apply betweenness centrality to real-world
situations such as choosing improved highway-node routings [23].
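
A sketch of the length-scaled recursion of Definition 3, written to follow the formula as it is stated in this section rather than any reference implementation by Geisberger et al., is given below. The function name `scaled_dependency` and the path-graph example are assumptions for illustration.

```python
from collections import deque


def scaled_dependency(adj, s):
    """Length-scaled dependence of source s on every node.

    Each contribution from a successor w to its predecessor v is
    multiplied by dist(s, v) / dist(s, w), the length function.
    """
    dist, sigma = {s: 0}, {s: 1}
    preds = {s: []}
    order = []
    queue = deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                preds[w].append(u)
    delta = {v: 0.0 for v in order}
    for w in reversed(order):            # farthest nodes first
        for v in preds[w]:
            length_ratio = dist[v] / dist[w]   # zero when v is the source
            delta[v] += length_ratio * (sigma[v] / sigma[w]) * (1 + delta[w])
    return delta


# Hypothetical path graph 1 - 2 - 3 - 4 with source node 1.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
delta = scaled_dependency(path_graph, 1)
```

For this path graph the sketch gives node 3 a value of 2/3 and node 2 a value of 5/6; the unscaled recursion would give 1 and 2 respectively, so the scaling damps the values of nodes close to the source.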


III. The reliance measure 

As mentioned before, the betweenness centrality algorithms given in
the previous section cannot calculate dependence for particular
source nodes. In our proposed reliance formula, we start with a
specific set of source nodes $s_i$ and calculate the reliance
(a.k.a. "dependency") value for each intermediate node $v$ that
occurs on paths starting from each $s_i$ to all possible end nodes
$t$. Thus, we consider all shortest paths from a particular source
$s$ to all possible end nodes $t$ where $s \neq t$, and the source
node $s$ comes from a specific set of interest, rather than from all
possible nodes as is done in the betweenness equations of the
previous section.

In the first part of our formula, we calculate the proportion of the
shortest paths linking source $s$ to all nodes $t$ that contain $v$
(equation (1)), and call it the importance rate $IR_{(s,t)}(v)$. We
name the pair-dependency of equation (1) the importance rate of $v$
as it indicates how often the node $v$ is relied on by $s$ to
complete a path from $s$ to a destination $t$, in proportion to all
paths from $s$ to $t$.

The second part of our formula gives a trust value that the 
source s places on v to pass messages to any t along the 
shortest paths. This trust value allows the measuring of trust 
between nodes that are far away from each other. The idea that 
someone has to trust a person much further down the chain of 
communication is an important concept in money laundering. 

Verbiest et al., in their research incorporating path length and
trust aggregation, declare that the shorter the distance from source
$s$ to $v$, the more the source $s$ trusts $v$. Similarly, De Meo et
al. design an algorithm to compute edge centrality using k-path
length with the assumption that the influence between two nodes
reduces when the distance between them increases. Indeed, a common
way to start a criminal investigation process is to identify the
closest node to a criminal or source because the shorter the
distance from the source, the higher the chances the node is the
source's subordinate.

This is not the case with money laundering. We claim instead that,
in the case of money laundering, the further away an intermediate
node is from a source node, the more the source needs to trust that
intermediate node to pass on a message. One of the stages of money
laundering is the
layering of money where the money is moved to different 
channels or banks through intermediaries such as corporations 
or trusts. The criminals are consequently connected to multiple 
bank accounts that are, in turn, connected to other people. The 
intermediaries are different people in the middle who help 
to channel the money to these different destinations so as to 
obscure its illegal source. 

Thus, the layering process in money laundering involves multiple
transactions of money through various channels. In such a money
layering process, money below the threshold is distributed to
different financial institutions or accounts. This contributes to
the growth in the length of the paths that are used to transport the
money from a source to a destination. The source uses longer and
longer sequences of channels to divide and distribute smaller
amounts of money, making it more and more difficult for law
enforcement authorities to identify the influential people. This
supports our claim that in a crime such as money laundering, the
source of the illegal funds has to place more trust in a person
further down the chain.

The trust formula defined below is able to measure this 
increased trust. 

Definition 4 (Trust). Given a graph $G = (V, E)$, $s$ a source node
and $v$ an intermediate node on some path $(s, t)$ from $s$ to an
end node $t$, the trust of $s$ on $v$, relative to the path
$(s, t)$, denoted by $T_{(s,t)}(v)$, is given by:

$$T_{(s,t)}(v) = \frac{\mu(s,v)}{\mu(s,t)} \quad \text{for } s \neq v \qquad (5)$$

where $\mu(s, u)$ is the minimum distance (number of edges) from $s$
to $u$ along the path $(s, t)$, $u \in V$, if one exists and
undefined otherwise.

We illustrate the trust concept using a representative money 
laundering network. 

Example 1. In Figure 1, let node 1 be the source node and node 8 the
destination or end node. A path from source 1 to destination 8 is:

$1 \to 2 \to 3 \to 5 \to 7 \to 8$



Fig. 1: A representation of a small money laundering network. Node 1
could be considered the source and node 8 the destination of the
money. The floating-point numbers near each node represent the trust
placed by source node 1 on each intermediate node from the set
{2, 3, 5, 7} in the path $1 \to 2 \to 3 \to 5 \to 7 \to 8$.

Suppose Figure 1 is a representation of a money laundering network
and node 1 is a money laundering suspect. An example of when this
might be the case is the layering of illegal money. Hence consider
Figure 1 as representing the layering of illegal money within a
money laundering syndicate, with node 1 the source of the money
being laundered.

In such a scenario, the source node (node 1 in Figure 1) needs to
trust node 7 (at distance 4) more than node 3 (at distance 2) to
pass on the message (or money) to node 8. This difference is
reflected in the trust value, with $T_{(1,8)}(7) = 4/5$ while
$T_{(1,8)}(3) = 2/5$.
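
The trust values along a single path can be sketched in a few lines of Python. The sketch assumes that trust is the ratio of the distance from the source to the intermediate node over the distance from the source to the end node, both measured along the path; the function name `trust_along_path` is hypothetical.

```python
from fractions import Fraction


def trust_along_path(path):
    """Trust of the first node of `path` in each intermediate node.

    Assumed form: (edges from source to v) / (edges from source to end).
    The source and the end node themselves are not intermediate nodes.
    """
    length = len(path) - 1               # number of edges on the path
    return {
        v: Fraction(i, length)
        for i, v in enumerate(path)
        if 0 < i < length
    }


# The path of Example 1, from source node 1 to end node 8.
trust = trust_along_path([1, 2, 3, 5, 7, 8])
for node, value in trust.items():
    print(node, value)
```

Under this assumption the trust grows in steps of 1/5 along the path, so node 7, four edges from the source, is trusted twice as much as node 3, two edges from the source.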

However, it is insufficient to just have a trust value, as the
number of paths that exist from a source to a destination would also
play a role, as would the number of paths that the particular
intermediate node lies on, to work out the reliance
of the source on this intermediary. Thus, we incorporate the 
importance rate and the trust value of an intermediate node, 
to get the reliance of source s on an intermediate node v. 

We focus on a particular source, s, that we call a suspect 
and our motive is to calculate the reliance of this suspect on 
the intermediate nodes that reside along all paths (s,t). 


Definition 5 (Reliance). Given a graph $G = (V, E)$, with
$|V| = n$, a source node $s$ and an intermediate node $v$ on some
path from $s$ to a particular end node $t$, given by $(s, t)$, the
reliance of $s$ on $v$ along the path $(s, t)$, $R_{(s,t)}(v)$, for
$v \in V$, is the product of the importance rate $IR_{(s,t)}(v)$ and
the trust $T_{(s,t)}(v)$:

$$R_{(s,t)}(v) = IR_{(s,t)}(v) \times T_{(s,t)}(v) \qquad (6)$$

Equation (6) gives the reliance of $s$ on $v$ along a particular
path $(s, t)$. Hence, the total reliance of source $s$ on $v$ over
all paths from $s$ to all possible end nodes $t$ is:

$$R_s(v) = \frac{\sum_{\{t \mid v \in (s,t)\}} R_{(s,t)}(v)}{(n - 2)} \qquad (7)$$

Since $|V| = n$, that is, there are $n$ vertices in the graph, and
since the start node $s$ and the intermediate node $v$ are fixed and
$R_{(s,t)}(v)$ is taken for all $t$, where $t \neq s \neq v \neq t$,
there are $(n - 2)$ possible values for $t$; thus, we normalise the
reliance value $R_s(v)$ with $(n - 2)$.
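
A minimal Python sketch of the total reliance of one source on every other node follows. It assumes an unweighted, undirected graph given as an adjacency dictionary with more than two nodes, takes the importance rate as the fraction of shortest s-t paths through v, and takes trust as dist(s, v) / dist(s, t) on shortest paths; the names `bfs_counts` and `reliance` and the path-graph example are illustrative assumptions, not the authors' implementation.

```python
from collections import deque
from fractions import Fraction


def bfs_counts(adj, s):
    """Hop distances and shortest-path counts from s (breadth-first search)."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def reliance(adj, s):
    """Total reliance of source s on each other node, normalised by n - 2."""
    n = len(adj)
    dist_s, sigma_s = bfs_counts(adj, s)
    result = {}
    for v in adj:
        if v == s:
            continue
        total = Fraction(0)
        if v in dist_s:
            dist_v, sigma_v = bfs_counts(adj, v)
            for t in adj:
                if t in (s, v) or t not in dist_s or t not in dist_v:
                    continue
                if dist_s[v] + dist_v[t] != dist_s[t]:
                    continue             # v is on no shortest s-t path
                importance = Fraction(sigma_s[v] * sigma_v[t], sigma_s[t])
                trust = Fraction(dist_s[v], dist_s[t])
                total += importance * trust
        result[v] = total / (n - 2)
    return result


# Hypothetical path graph 1 - 2 - 3 - 4 with source node 1.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
r = reliance(path_graph, 1)
```

On this toy path graph the source relies on node 2 with value 5/12 and on node 3 with value 1/3; node 4 is never an intermediate node, so its reliance is 0.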

IV. Comparison between different dependency techniques

We compare our reliance measure to the betweenness calculations
given by Freeman (equation (2)), Brandes (equation (3)) and
Geisberger et al. (equation (4)), and show that even though all four
formulae use the dependency given by equation (1), with Geisberger
et al.'s formula additionally using a version of the length
function, there are fundamental differences between these formulae,
and that the reliance formula is best able to do the job of ranking
nodes with respect to a particular source.

Example 2. Comparison of reliance with Freeman [27], Brandes and
Geisberger et al. betweenness. We use the graph in Figure 1 to
compare the reliance of node 1 on intermediate nodes of the network
with the values assigned to the intermediate nodes by the three
dependency measures, as given by equations (2), (3) and (4)
respectively. The results are shown in Figure 2.

The reliance (dependency) of node 1 on itself is zero. 
Similarly, the reliance and dependence value of node 1 on 
node 8 are zero because there is no shortest path from source 
1 to any possible end node that contains 8 as an intermediate 
node. The biggest difference that we can see from the graph in
Figure 2 is in the dependence of node 1 on node 3. Both Freeman and
Brandes give the dependence value of node 3 as less than that of
node 2, whereas Geisberger et al. put the dependence value of node 3
as more than that of node 2. Similarly, the reliance measure has
node 1 relying more on node 3 than on node 2. The network in
Figure 1 is undirected and node 3 is at the crucial position of
separation between two sets of nodes. Removing node 3 cuts the
information flow from node 1 and node 2 to the other nodes. Thus,
based on the position of the nodes, the dependence and the reliance
value of node 1 on node 3 should intuitively be higher. Both
Geisberger et al.'s technique and the reliance measure reflect this.

Fig. 2: The result of applying the different dependency formulae
and the reliance formula to the network shown in Figure 1
(horizontal axis: Node 2 to Node 8). Here, for the purpose of
comparison, we choose node 1 as the source and take all shortest
paths from this source node to all possible end nodes. The value of
reliance of node 1 on the intermediate nodes is compared to the
dependency values given by equations (2), (3) and (4).


Even though the ranking of the nodes in the graph in Figure 1 is the
same for Geisberger et al. and reliance, the technique by Geisberger
et al. is different from our reliance formula, as the reliance
calculation focuses on the reliance of a specific source on other
nodes whereas, in general, Geisberger et al.'s formula gives
betweenness by considering all sources. Consequently, as the graph
gets bigger, the estimate produced by Geisberger et al.'s technique
grows much larger than our reliance measure, something that is
clearly illustrated in the next example.

Example 3. Difference between the reliance formula and Geisberger
et al.'s formula: We use a slightly larger graph (Figure 3) to
illustrate the growth in dependence using Geisberger et al.'s
formula when compared to the reliance calculation.



Fig. 3: A sample graph. This sample graph is used to compute 
the dependency of node 1 on other nodes. 

Figure 3 shows a network with 8 nodes. We calculate the dependency
of node 1 on other nodes using Geisberger et al.'s technique
(equation (4)), taking $s = 1$ and expanding the recursion over the
successors $w$ of each node:

$$\delta_{s*}(v) = \sum_{w : v \in P_s(w)} \left[ \frac{\mu(s,v)}{\mu(s,w)} \cdot \frac{\sigma_{sv}}{\sigma_{sw}} \times (1 + \delta_{s*}(w)) \right] \qquad (8)$$


Thus equation (8) over-estimates the dependency by a term in
$\frac{\mu(s,2)}{\mu(s,8)} \left[ \frac{\sigma_{s2}}{\sigma_{s8}} \right]$
due to the repetitive dependency calculation of the source node on
nodes 6, 7 and 8. Thus the direct use of the Geisberger et al.
formula for the purpose of calculating reliance would result in an
overestimation of the dependence values of certain nodes, a fact
that becomes clear in Section VI. While Geisberger et al.'s formula
is a good means of estimating values for large networks, our purpose
here is to further refine a possible network for criminal
investigation, and rank nodes for a specific set of source nodes.


V. Datasets 

We use two different datasets in our experiments - the Enron 
dataset and the Noordin Top Terrorist Dataset. 


A. The Enron Network 

In our earlier work, we first divided the Enron email dataset, which
consists of 1,887,305 email transactions, into two groups: emails
that were sent using only the fields 'TO/CC' and those emails that
also contained 'BCC' recipients. Our SPNSA implementation was based
on the number of BCC recipients needed to identify a trust network,
where the focus was on emails that contained one or two bcc-ed
recipients. By their very nature, 'BCC' email transactions contain
recipients that are kept secret. Our experimental results there
showed that the undirected BCC email transactions have the largest
number of criminals in the shortest paths network, thus further
analysis was conducted using only the BCC email
transactions. In this paper, both the ‘TO/CC’ and ‘BCC’ 
undirected email transaction networks are used to produce the 
reliance sub-networks for all suspects. Through dividing the 
email transactions, we are able to compare the important nodes 
that a suspect relies on, whether or not the connection (email) 
is kept secret (bcc-ed). To form an undirected network, we 
make the broad assumption that an email sent from A to B 
or from B to A implies an undirected relationship between A 
and B. 
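
The undirected-network assumption can be sketched as follows. The tuple format `(sender, recipient)`, the function name `build_undirected` and the sample addresses are hypothetical; the Enron data itself is not reproduced here. Storing neighbours in sets collapses repeated emails between the same pair into one edge, and self-addressed emails are skipped.

```python
def build_undirected(transactions):
    """Build an undirected simple graph from (sender, recipient) pairs.

    An email in either direction creates one undirected edge; self-loops
    are dropped and repeated pairs collapse into a single edge.
    """
    adj = {}
    for sender, recipient in transactions:
        if sender == recipient:
            continue                     # ignore self-loops
        adj.setdefault(sender, set()).add(recipient)
        adj.setdefault(recipient, set()).add(sender)
    return adj


# Made-up transactions between hypothetical addresses A, B, C and D.
emails = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "C"), ("B", "D")]
adj = build_undirected(emails)
edge_count = sum(len(neighbours) for neighbours in adj.values()) // 2
```

In this made-up sample the three A-B emails become a single edge, the self-addressed email from C is discarded, and the network ends up with three nodes and two edges.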

The majority, 87.3%, of email transactions in the Enron 
dataset, use the fields ‘TO/CC’, while 12.7% are ‘BCC’ email 
transactions (i.e. they contain bcc-ed recipients). The email
transactions in this dataset comprise external and internal email
addresses, where the external email addresses are those that do not
have '@enron' whereas the internal emails do. The email addresses
are the nodes and we give importance to the sequence of emails
exchanged by the nodes rather than the content of the emails. As in
our earlier work, here too the sets of source nodes, consisting of
Enron finance managers as well as a few others who held top posts,
were selected based on the possibility of having been involved in
money laundering; they are henceforth called suspects, and are used
as feed to the SPNSA algorithm. We run our experiments using two
sets of suspects: the first set consists of the Enron finance
managers, while the second, larger, set comprises all managers
including the finance managers. Both sets were collected from a
report on the chronology of events related to the collapse of Enron.

A manager may be indexed by more than one node if 
he or she has more than one email account. The two email 
transaction groups, ‘TO/CC’ and ‘BCC’, have distinct ID sets 
for the different email addresses, designed as such because, 
somewhat surprisingly, some nodes that exist in the ‘BCC’ 
group do not exist in the ‘TO/CC’ group. The network formed 
using the ‘BCC’ email transaction subset has 19,716 nodes and 
65,532 edges while the network formed using the ‘TO/CC’ 
email transaction subset has 26,027 nodes and 252,863 edges. 
All self-loops and multiple edges have been removed from 
these networks. 


B. The Noordin Top Terrorist Network 

The second dataset that we use to compare rankings is the Noordin
Top terrorist network. This dataset is small and consists of
different types of connections. The first group of connections gives
the terrorists' affiliations, such as terrorist organisations,
educational institutions, and business and religious institutions.
The second group contains relationship information such as
classmates, kin and friends, while the third group comprises
individuals that provided logistical support or participated in
training events, terrorist operations, and meetings. We take two
different subsets from this dataset for our analysis, the
terrorist-friendship and terrorist-classmates networks.

In our experiment, we rank the terrorists using the
source-intermediate reliance value. Since it is a small network, we
do not extract any terrorist sub-network using SPNSA, instead using
the terrorist-friendship and terrorist-classmates networks as they
are.

VI. Ranking important nodes using the Reliance Measure

As we saw in Section IV, even in small networks there is a
difference between the dependence rankings given by the betweenness
centrality methods and the reliance measure. In this section, we
illustrate further the need for our reliance measure.

In subsection VI-A, for the purpose of illustration, we compare the
ranking of the nodes in the Enron 'BCC' network using betweenness
centrality and the reliance formula. We also compare the reliance
value ranking with the rankings obtained using a Markov centrality
score and the Google PageRank method.


Random walks are used to calculate Markov centrality scores [40],
with the centrality score of a node being calculated by first taking
the average path length of a random walk starting at that node and
arriving (for the first time) at some other node, and then averaging
over all other nodes. The Markov centrality of the node is then the
inverse of the average distance between it and every other node
[40]. The PageRank algorithm, used by Google to rank important web
pages, uses the assumption that a page is important when it is
linked to by many pages or by other important pages. The
mathematical equivalent of this concept is the eigenvector
centrality measure.
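
To make the PageRank idea concrete, a textbook power-iteration sketch in Python is shown below. It is a generic illustration under stated assumptions, not the implementation used in the experiments: the damping factor of 0.85, the fixed iteration count, the function name `pagerank` and the star-shaped example graph are all assumptions.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on an adjacency dictionary.

    Each node repeatedly shares its current rank equally among the nodes
    it links to; a node with no links shares its rank with every node.
    """
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in adj}
        for u, neighbours in adj.items():
            if neighbours:
                share = damping * rank[u] / len(neighbours)
                for w in neighbours:
                    new_rank[w] += share
            else:
                for w in adj:
                    new_rank[w] += damping * rank[u] / n
        rank = new_rank
    return rank


# Hypothetical undirected star: centre "c" linked to "x", "y" and "z".
star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
rank = pagerank(star)
top = max(rank, key=rank.get)
```

In the star example the centre receives links from every leaf and therefore ends up with the highest rank, while the ranks of all nodes sum to 1.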


For the Noordin Top terrorist network, in Section VI-B we use these
measures to rank a terrorist, his friends and classmates to see if
there are any differences between the reliance ranking and the other
measures.


A. Comparing reliance node ranking with betweenness centrality, Markov centrality and PageRank on the Enron network

We use only the Enron BCC network for this part of the
experiments. We compare the node rankings resulting from
the betweenness centrality measures proposed by Brandes and
by Geisberger et al. with the rankings produced by the reliance
measure on particular subsets of the Enron ‘BCC’ network.
Since betweenness centrality measures the dependency of all
source nodes on v, we decided that the comparison was best
done by comparing the betweenness value of a node v with
its combined reliance value, obtained by taking
the sum of the reliance of all s in the subset S on this v,
where S is the set of all finance managers (respectively, all
managers) in the Enron finance manager (respectively, manager)
BCC sub-network. The nodes v are ranked in descending
order of their betweenness centrality (Brandes and Geisberger
et al.) values.
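To make the phrase "the dependency of all source nodes on v" concrete, the sketch below implements the standard single-source accumulation step of Brandes' algorithm for unweighted graphs and then sums the resulting dependencies either over every source (which gives shortest-path betweenness) or over a chosen subset of sources only. The subset-restricted sum is shown purely to illustrate the contrast between "all sources" and "a given set of sources"; it is not the reliance formula of this paper, and the function names and the toy path graph are assumptions made for the example.

```python
from collections import deque


def dependencies_from(graph, s):
    """Brandes' single-source step: dependency of source s on every node v."""
    sigma = {v: 0 for v in graph}     # number of shortest paths from s to v
    dist = {v: -1 for v in graph}     # BFS distance from s (-1 = unvisited)
    preds = {v: [] for v in graph}    # predecessors on shortest paths from s
    sigma[s], dist[s] = 1, 0
    order, queue = [], deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if dist[w] < 0:                  # w discovered for the first time
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:       # edge (v, w) lies on a shortest path
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in graph}
    for w in reversed(order):                # farthest nodes first
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0                           # a source does not depend on itself
    return delta


def summed_dependency(graph, sources):
    """Sum the dependency of each source in `sources` on every node."""
    total = {v: 0.0 for v in graph}
    for s in sources:
        for v, d in dependencies_from(graph, s).items():
            total[v] += d
    return total


# Hypothetical toy network: the undirected path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

# All sources: betweenness (each unordered pair is counted twice here).
all_sources = summed_dependency(path, path)
# One source only: b and c are no longer ranked equally.
from_a_only = summed_dependency(path, ["a"])
```

On this toy path, b and c tie when every node is a source, but b ranks above c once only the source a is considered, which is the kind of shift in ranking that a source-specific measure is meant to expose.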

We start our comparisons with the finance manager
sub-network extracted from the BCC network. Five employees
worked as finance managers at Enron between the years
1990 and 2001 and were used as the SPNSA feed. These
finance managers were Andrew Fastow (686, 687), Sherron
Watkins (16929), Ben Glisan (1369), Rick Causey (15077)
and Jeff McMahon (8071), and the sub-network extracted, the
Enron Finance Manager BCC sub-network, had 30 nodes and
53 edges. As mentioned before, the betweenness centrality
measures of Brandes and Geisberger et al. calculate the
centrality value of v using all source nodes. To compare
the reliance value with the betweenness centrality measures,
we take the total reliance of all finance managers, F =
{686, 687, 16929, 1369, 15077, 8071}, on each v in the Enron
Finance Manager BCC sub-network. Thus, the total reliance
value for each possible intermediate node v is the sum, over
all s ∈ F, of the reliance of s on v.
This compares well with the betweenness centrality value for
v in the case of both Brandes and Geisberger et al., which is
the sum of the dependency of s on v taken over all s in the
sub-network, and not just s ∈ F.

In addition, we normalise the total reliance values as well as
the betweenness centrality values obtained using Brandes and
Geisberger et al. by dividing each node’s value by the
maximum value in its respective set.
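The aggregation and normalisation just described can be sketched as follows. The sketch assumes that the per-source reliance values have already been computed and are stored as a nested mapping `reliance[s][v]`; that layout, the function names and the toy values (with placeholder labels rather than Enron node IDs) are assumptions for illustration only.

```python
def total_reliance(reliance, sources):
    """Sum the reliance of every source in `sources` on each intermediate node."""
    totals = {}
    for s in sources:
        for v, value in reliance.get(s, {}).items():
            totals[v] = totals.get(v, 0.0) + value
    return totals


def normalise_by_max(scores):
    """Divide every score by the largest score in the same set."""
    top = max(scores.values(), default=0.0)
    if top == 0:
        return {v: 0.0 for v in scores}
    return {v: x / top for v, x in scores.items()}


# Hypothetical reliance values: reliance[s][v] is the reliance of s on v.
reliance = {
    "s1": {"v1": 0.5, "v2": 0.25},
    "s2": {"v1": 0.25, "v3": 0.5},
    "s3": {"v2": 1.0},            # not in the chosen subset below
}
subset = ["s1", "s2"]

totals = total_reliance(reliance, subset)
normalised_totals = normalise_by_max(totals)
```

Because each measure is divided by its own maximum, the largest bar of every measure has height 1, which makes the rankings comparable on one chart even though the raw scales differ.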

The comparison of betweenness centrality with reliance
node ranking for the nodes relied on by the Enron finance
managers (in the Enron Finance Manager BCC sub-network) is
depicted in Figure 4.



[Figure 4 omitted. X-axis: Nodes. Legend: Brandes, Geisberger et al., Reliance.]

Fig. 4: The Enron finance manager BCC sub-network node
ranking. The total reliance of all finance managers on each v
in the Enron Finance Manager BCC sub-network is used for this
comparison. The intermediate nodes v are ranked in
descending order of betweenness centrality (Brandes) values. There is
a clear difference in node ranking between the total reliance measure
and the betweenness centrality measures. For example, node 2473 is more
heavily relied on by the finance managers than 686, whereas both
betweenness centrality measures rank these two the other way
around.


In Figure 4, all three measures identified the same node as
having the highest value. The interesting point to note is that
some of the nodes (for example, 11010 and 12935) ranked
lower by the betweenness centrality measures are ranked as
being more important by the reliance measure. Node 11010
(lfastow@pop.pdq.net), ranked third by reliance, belongs to
Lea Fastow, who was convicted of money laundering.

The total reliance ranking of the intermediate nodes v is next
compared with the other two ranking methods, PageRank and
Markov centrality, in the Enron Finance Manager BCC
sub-network. Again, the values are normalised by dividing each
value by the largest value in the respective set. The results
are shown in Figure 5. The nodes are ranked in
descending order of PageRank scores.


[Figure 5 omitted. Bar chart. X-axis: Nodes. Legend: PageRank, Markov centrality, Reliance.]

Fig. 5: The Enron finance manager BCC sub-network node
ranking. The total reliance of all finance managers on each v in
the Enron Finance Manager BCC sub-network is compared with the
PageRank and Markov centrality value of v. The nodes are ranked
in descending order of PageRank values. There is a clear
difference in node ranking, with, for example, nodes 3945, 3973,
10917 and 7974 being ranked similarly by Markov centrality and
PageRank but very differently by the reliance measure.


The bar chart in Figure 5 shows the differences in node
ranking between PageRank, Markov centrality and the reliance
measure. A few interesting differences are visible in this figure.
Nodes 3945, 3973, 10917 and 7974 are ranked very differently
by reliance but similarly by PageRank and Markov centrality.
In contrast, nodes 3983, 687 and 15077 are valued by Markov
centrality and PageRank but not relied on at all by the finance
managers (see the right-hand side of the bar chart).
Finally, 11010 (Lea Fastow) was also not picked out by either
PageRank or Markov centrality.
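Differences of this kind can also be listed programmatically rather than read off a chart. The sketch below converts two score tables into rank positions and reports the nodes whose positions differ the most. The function names and the toy scores (with placeholder labels, not values from the Enron experiments) are assumptions for illustration.

```python
def rank_positions(scores):
    """Map each node to its position (1 = highest score); ties break by label."""
    ordered = sorted(scores, key=lambda v: (-scores[v], str(v)))
    return {v: i + 1 for i, v in enumerate(ordered)}


def rank_shifts(reference, other):
    """List (node, rank in reference, rank in other), largest shift first."""
    ref, oth = rank_positions(reference), rank_positions(other)
    common = ref.keys() & oth.keys()
    return sorted(
        ((v, ref[v], oth[v]) for v in common),
        key=lambda t: (-abs(t[1] - t[2]), str(t[0])),
    )


# Hypothetical normalised scores for four nodes under two measures.
pagerank_like = {"n1": 1.0, "n2": 0.8, "n3": 0.5, "n4": 0.2}
reliance_like = {"n1": 0.1, "n2": 1.0, "n3": 0.0, "n4": 0.6}

shifts = rank_shifts(pagerank_like, reliance_like)
```

Working with rank positions rather than raw scores keeps the comparison independent of the different scales of the measures.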

The comparisons are repeated for all the Enron managers,
a larger algorithm feed, to see again whether there is a difference
in the ranking. A shortest paths sub-network is formed using
all the managers and named the Enron Manager BCC
sub-network. This sub-network is a larger undirected graph with
121 nodes and 314 edges. First, the comparison is carried out
between the reliance measure and the betweenness measures,
as before, and the intermediate nodes are ranked in
descending order of betweenness centrality (Brandes) values.



[Figure 6 omitted. X-axis: Nodes.]

Fig. 6: The Enron manager BCC sub-network node ranking. The
total reliance of all managers on each v in the Enron Manager BCC
sub-network is compared with the betweenness centrality measures of
Brandes and Geisberger et al., with rankings ordered in descending
order of Brandes’ betweenness. Note that only the nodes with positive
reliance value are displayed along the X-axis.

Even more than Figure 4, Figure 6 shows the difference in
the ranking between the betweenness centrality measures and
the reliance measure; the node (17697) picked as the most
important one by the reliance measure is not picked by either
Brandes or Geisberger et al., both of which pick 15932, a node
regarded as irrelevant by the reliance measure. Moreover, the
betweenness centrality values for nodes 348, 441 and 9395 are
almost the same (and negligible), whereas the reliance measure
shows a different ranking with much higher values.

Finally, we compare intermediate node reliance with PageRank
and Markov centrality in the Enron Manager BCC
sub-network. The result is shown in Figure 7. The nodes are ranked
in descending order of PageRank values. Again,
there is a clear difference between the three rankings.

From the comparisons above, we can see that an
investigation using the reliance measure rather than any of the
betweenness centrality, PageRank or Markov centrality
rankings will lead to different, possibly more relevant, people to
investigate. For example, Lea Fastow was not in the algorithm
feed for the experiments using either all managers or
finance managers; however, as illustrated in Figure 4, Lea
Fastow, the wife of Andrew Fastow and a convicted money
launderer, was heavily relied on by the finance
managers.

[Figure 7 omitted. X-axis: Nodes. Legend: PageRank, Markov centrality, Reliance.]

Fig. 7: The Enron manager BCC sub-network node ranking. The
total reliance of all managers on each intermediate v in the Enron
Manager BCC sub-network is used for this comparison. The
intermediate nodes (on the X-axis) are ranked in descending
order of PageRank values. There is a clear difference in the ranking of
the nodes using the three different methods. For example, PageRank,
Markov centrality and the reliance measure rank nodes 19075, 6673
and 17697, respectively, as having the highest value.


B. Comparing reliance node ranking with betweenness centrality,
Markov centrality and PageRank in the Noordin Top
terrorist network

(a) The terrorist-friendship sub-network nodes ranked using
betweenness centrality by Brandes and Geisberger et al., and reliance.

(b) The terrorist-classmate sub-network nodes ranked using reliance,
and betweenness centrality by Brandes and Geisberger et al.


Four different bar charts are presented here showing the
comparison of intermediate node reliance with betweenness,
Markov centrality, and PageRank. The terrorist-friendship
sub-network consists of 61 nodes and 91 edges, and the
terrorist-classmate sub-network has 39 nodes and 175 edges. We first
isolate the highest reliance of each node on an intermediate
node in the shortest paths between the former node and all
other nodes in the sub-network. We then sum all the
reliance values for the same intermediate node and normalise
the sum by the highest value in the list. We apply a similar
normalisation to the betweenness as well as the PageRank and
Markov centrality values. The results are shown in Figures 8 and 9.

Both Figures 8 and 9 illustrate the versatility of the reliance
measure, with very few nodes showing up as being relied on
by all the members of the group. Thus, for example, taking
the terrorist-classmate sub-network (panel (b)), an investigator
could narrow their first detailed
investigations to just 5 members of the group.
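One possible reading of the aggregation used for the Noordin Top sub-networks is sketched below: for every node, keep only its highest reliance value, add that value to the running total of the intermediate node it refers to, and finally divide all totals by the largest one. This reading, the nested mapping `reliance[s][v]`, the function name and the toy values are assumptions; the text above does not fix these details.

```python
def summed_top_reliance(reliance):
    """Sum each source's highest reliance per intermediate node, then normalise."""
    totals = {}
    for s, row in reliance.items():
        if not row:
            continue
        top = max(row.values())
        if top <= 0:
            continue                      # this source relies on nobody
        for v, value in row.items():
            if value == top:              # ties credit every top intermediate
                totals[v] = totals.get(v, 0.0) + value
    peak = max(totals.values(), default=0.0)
    if peak == 0:
        return {}
    return {v: x / peak for v, x in totals.items()}


# Hypothetical reliance values: reliance[s][v] is the reliance of s on v.
reliance = {
    "p": {"x": 0.6, "y": 0.2},
    "q": {"x": 0.5, "z": 0.1},
    "r": {"y": 0.4},
}
ranking = summed_top_reliance(reliance)
```

Intermediate nodes that are nobody's most relied-on contact (here `z`) drop out entirely, which is consistent with the observation that only a few nodes show up in the charts.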

VII. Identifying crime priority nodes using the reliance measure

The experiments hereafter use only the Enron dataset and
the reliance measure to obtain the persons of interest (a.k.a.
“crime priority nodes”) of each finance manager and manager
in both the ‘BCC’ and ‘TO/CC’ networks. The
main purpose of these experiments is to identify the
intermediate nodes important to the suspects, as ranked by the reliance
measure. At this point, we do not corroborate the identified
persons of interest with any published articles or past research.


Fig. 8: The nodes v are listed on the X-axis, with the Brandes,
Geisberger et al. and reliance values obtained by each node depicted
as the height of the bars, and with the nodes ranked in descending order of
Brandes’ betweenness centrality values. For each graph, nodes with
zero reliance and with a betweenness (Brandes) value below that
of the node with the lowest reliance are not included.

A. Extracting the suspects’ crime priority nodes from the
Enron ‘BCC’ email network

We first show the results of identifying the Enron money
laundering suspects’ important nodes in the ‘BCC’ email
network based on the suspect-intermediate reliance value. We start
by choosing the Enron finance managers as our source nodes,
and extract the sub-network using SPNSA. The reliance of
each finance manager on every intermediate node (v) in the
paths between that finance manager and all other nodes within
the Enron Finance Manager BCC shortest paths sub-network
is calculated. The intermediate node that each finance manager
relies on the most is identified. The list of these crime
priority nodes reveals two money laundering criminals
as important to each other: Ben Glisan (1369) and Andrew
Fastow (686).
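The selection step described above, picking for each suspect the intermediate node it relies on the most and then checking whether that node is itself a suspect, can be sketched as follows. The reliance values are taken as given in a nested mapping `reliance[s][v]`; the function names, that layout and the toy values (placeholder labels, not Enron node IDs) are assumptions for illustration.

```python
def crime_priority_nodes(reliance, suspects):
    """For each suspect, return the intermediate node it relies on the most."""
    priority = {}
    for s in suspects:
        # Ignore the suspect itself and intermediates with zero reliance.
        row = {v: x for v, x in reliance.get(s, {}).items() if v != s and x > 0}
        if row:
            # Ties are broken by label so that the result is deterministic.
            priority[s] = max(row, key=lambda v: (row[v], str(v)))
    return priority


def suspect_to_suspect(priority, suspects):
    """Keep only the cases where a suspect's priority node is another suspect."""
    suspect_set = set(suspects)
    return {s: v for s, v in priority.items() if v in suspect_set}


# Hypothetical reliance values: reliance[s][v] is the reliance of s on v.
reliance = {
    "s1": {"s2": 0.7, "n1": 0.3},
    "s2": {"s1": 0.6, "n2": 0.1},
    "s3": {"n1": 0.9},
}
suspects = ["s1", "s2", "s3"]

priority = crime_priority_nodes(reliance, suspects)
mutual = suspect_to_suspect(priority, suspects)
```

In this toy table `s1` and `s2` turn out to be each other's priority node, the same pattern as the suspect-to-suspect reliance reported above for the finance manager sub-network.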

Next, we use all managers (including the finance managers)
as the suspects (SPNSA algorithm feed), to see if new people
of interest are obtained. Unlike the Enron Finance Manager
BCC sub-network, there is no criminal to criminal reliance in
this network. However, nodes that are the persons of interest
to the managers include managers Greg Whalley (6673),
President and Chief Operating Officer, and Lou Pai (11357),
CEO of Enron Energy Services.

[Figure 9 omitted. Two bar charts. X-axis: Terrorist Nodes. Legend: PageRank, Markov centrality, Reliance.]

(a) The terrorist-friendship sub-network nodes ranked using reliance,
PageRank and Markov centrality.

(b) The terrorist-classmate sub-network nodes ranked using Markov
centrality, PageRank and reliance.

Fig. 9: The nodes v are listed on the X-axis, with the PageRank,
Markov and reliance values obtained by each node depicted as the
height of the bars, and with the nodes ranked in descending order of
PageRank scores. For each graph, terrorists with zero reliance and
ranked by PageRank scores below the terrorist with the lowest
reliance are not included.

B. Extracting suspects’ crime priority nodes from the Enron
‘TO/CC’ email network

Using the same method as shown above, we extract the
finance managers’ crime priority nodes from the ‘TO/CC’
shortest paths sub-network. Note that some employees’ email
addresses that exist in the ‘BCC’ network do not occur
in the ‘TO/CC’ network; for example, the email address
andrew.fastow@ljminvestments.com, belonging to Andrew
Fastow, occurs only in the ‘BCC’ network. Thus, different IDs
were used in the ‘TO/CC’ network.

The finance managers used as the SPNSA algorithm
feed are Andrew Fastow (1472), Sherron Watkins (26577),
Ben Glisan (2521), Rick Causey (24277) and Jeff McMahon
(12919). George Wasaff (george.wasaff@enron.com, node 10351)
is the intermediate node that all finance managers rely on the
most in this network, which is an interesting fact and could
be the starting point for further investigation.

It is possible that more criminal to criminal connections
would be identified if an investigator picked the best possible
suspects to inspect (based on corroborative information or
interviews). The experiments in this section show that the
reliance measure allows an investigator to find people who
are close to and heavily relied on by the suspects for further
investigation.

VIII. Conclusion

This paper introduced a new reliance measure to rank
nodes in a network and thereby identify nodes of interest.
This reliance measure is different from betweenness centrality
because the latter calculates the centrality value of a node
for all sources, whereas the former calculates the reliance
of a given list of specific sources on a node. We compared
the reliance ranking with other centrality measures, Google
PageRank and Markov centrality, as well as betweenness
centrality. Reliance identified a very different subset of nodes
from those identified by the other measures. We showed that
the ranking based on the reliance measure could also be used
to identify the nodes relied on the most by particular persons
of interest to an investigation.

Our SPNSA is able to produce a
small and manageable network. In this paper we furthered
our research and analysed the connections between nodes in
the network: the reliance of one node on another, possibly leading
to the identification of important nodes. This reliance method
could reduce the time needed to explore a large network and
hence may speed up an investigation process. It is important
to note that, prior to the application of the shortest paths
network search algorithm and the reliance formula, it is
essential to choose the suspects most relevant to a crime. The
analysis proposed in this paper could also yield more criminal
to criminal connections by choosing the most applicable or
appropriate algorithm feed for a crime incident.